feat(rewriter): instruction-level offset map (#143 DWARF Phase 2 inc 2)#202

Merged
avrabe merged 1 commit into main from feat/dwarf-phase2-inc2-rewriter-offset-map
May 28, 2026

Contributor

@avrabe avrabe commented May 28, 2026

Summary

Second increment of #143 DWARF Phase 2. meld's rewriter changes operand values (function/global/table/memory/type indices) whose LEB128 encodings shift byte length, so intra-function byte offsets drift during rewriting — .debug_line programs can't be remapped by function-base relocation alone. This adds the instruction-level old→new offset map that captures that drift.
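The length shift described above comes from LEB128 using one byte per 7 bits of payload. A minimal sketch of that arithmetic follows; the helper names `uleb128_len` and `uleb128_encode` are illustrative and are not taken from meld or wasm_encoder.

```rust
/// Number of bytes the unsigned LEB128 encoding of `value` occupies.
/// (Hypothetical helper for illustration; not part of meld.)
pub fn uleb128_len(mut value: u32) -> usize {
    let mut len = 1;
    // Each byte carries 7 payload bits; continue while higher bits remain.
    while value >= 0x80 {
        value >>= 7;
        len += 1;
    }
    len
}

/// Encode `value` as unsigned LEB128, low 7-bit group first.
pub fn uleb128_encode(mut value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last group: continuation bit stays clear.
            out.push(byte);
            return out;
        }
        // More groups follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}
```

Under this encoding, index 0 takes one byte and index 200 takes two, so remapping `call 0` to `call 200` pushes every later instruction in the same function one byte further along.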

Changes

  • InstrOffset { old, new } + InstrOffsetMap { entries } with translate(old) -> Option<new>
  • rewrite_function_body_with_offsets — parallel entry point returning (Function, InstrOffsetMap), sharing a private core with the existing rewrite_function_body (unchanged signature, zero caller churn in merger.rs)
  • OLD positions from reader.into_iter_with_offsets(); NEW positions from per-instruction encoded-length measurement (identical bytes to Function::instruction, both via wasm_encoder::Encode)
  • opt-in: the plain path pays nothing and emits byte-identical code
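For orientation, the two types named in the list above might look roughly like the sketch below. The `u32` field width, the derives, the sorted-entries invariant, and the binary-search lookup are assumptions made for illustration; this page does not show the actual definitions in rewriter.rs.

```rust
/// One instruction boundary: byte offset before and after rewriting.
/// Field types are an assumption; the PR text only names `old` and `new`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InstrOffset {
    pub old: u32,
    pub new: u32,
}

/// Old→new offsets for every instruction of one function body.
/// Assumed here to be sorted by `old`, which holds if entries are
/// pushed in the order the reader yields instructions.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct InstrOffsetMap {
    pub entries: Vec<InstrOffset>,
}

impl InstrOffsetMap {
    /// Exact-match lookup: `Some(new)` only when `old` is the start of an
    /// instruction, `None` for offsets that fall inside an instruction.
    pub fn translate(&self, old: u32) -> Option<u32> {
        self.entries
            .binary_search_by_key(&old, |e| e.old)
            .ok()
            .map(|i| self.entries[i].new)
    }
}
```

Returning `None` for a mid-instruction offset, rather than rounding to the nearest boundary, is one possible reading of the "hits and misses" test name; the real lookup may differ.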

Why offsets are relative to the instruction stream

Both old and new are relative to the start of the function body's instruction stream (the byte after the locals vector). Increment 3 composes these intra-function offsets with the per-function base from the v0.16.0 component-provenance v2 code_range to translate DWARF code addresses input→fused.
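The composition planned for increment 3 can be sketched as base subtraction, intra-function lookup, then base addition. Everything in this block is hypothetical: the function name, the `u64` address width, the slice-of-pairs stand-in for the offset map, and the assumption that both bases already point at the start of the instruction stream (a code_range that starts at the body, before the locals vector, would need that prefix length added first).

```rust
/// Translate an input-module code address to the fused module.
/// `intra` holds (old, new) instruction-stream offsets sorted by `old`.
/// Hypothetical sketch; not the increment 3 implementation.
pub fn translate_code_address(
    old_addr: u64,
    old_stream_base: u64,
    new_stream_base: u64,
    intra: &[(u32, u32)],
) -> Option<u64> {
    // Address relative to the old instruction stream; None if below the base.
    let rel = u32::try_from(old_addr.checked_sub(old_stream_base)?).ok()?;
    // Only exact instruction boundaries are translatable.
    let idx = intra.binary_search_by_key(&rel, |&(old, _)| old).ok()?;
    // Rebase onto the function's position in the fused output.
    new_stream_base.checked_add(u64::from(intra[idx].1))
}
```

With `intra = [(0, 0), (2, 3), (3, 4)]`, an old address of `0x102` under an old base of `0x100` and a new base of `0x200` resolves to `0x203`.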

Tests (4 new)

  • instr_offset_map_tracks_leb_growth_from_index_remap: call 0 → call 200 grows the operand LEB 1→2 bytes; divergence accumulates exactly [0,1,1,2,2,2] across call/drop/call/drop/const/end
  • instr_offset_map_is_identity_when_no_leb_length_change — 0→1 keeps a 1-byte LEB; new == old everywhere
  • instr_offset_map_translate_hits_and_misses
  • with_offsets_emits_identical_function_bytes — offset collection must not perturb emitted code
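The [0,1,1,2,2,2] divergence in the first test can be checked by hand from per-instruction encoded lengths. The sketch below assumes both calls are remapped 0→200 and that the const is an `i32.const` with a one-byte immediate (the PR only says "const"); the helper names are illustrative.

```rust
/// Start offset of each instruction, given each instruction's encoded length.
pub fn start_offsets(lens: &[u32]) -> Vec<u32> {
    let mut offsets = Vec::with_capacity(lens.len());
    let mut pos = 0;
    for &len in lens {
        offsets.push(pos);
        pos += len;
    }
    offsets
}

/// Per-instruction drift `new - old`, signed so shrinking encodings also work.
pub fn divergence(old_lens: &[u32], new_lens: &[u32]) -> Vec<i64> {
    start_offsets(old_lens)
        .into_iter()
        .zip(start_offsets(new_lens))
        .map(|(old, new)| i64::from(new) - i64::from(old))
        .collect()
}

// Encoded lengths for call / drop / call / drop / const / end.
// `call 0` is opcode + 1-byte LEB = 2 bytes; `call 200` is opcode + 2-byte LEB = 3.
pub const OLD_LENS: [u32; 6] = [2, 1, 2, 1, 2, 1];
pub const NEW_LENS: [u32; 6] = [3, 1, 3, 1, 2, 1];
```

Here `start_offsets(&OLD_LENS)` gives `[0, 2, 3, 5, 6, 8]`, `start_offsets(&NEW_LENS)` gives `[0, 3, 4, 7, 8, 10]`, and `divergence(&OLD_LENS, &NEW_LENS)` gives `[0, 1, 1, 2, 2, 2]`: each remapped call adds one byte of drift to everything after it.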

No new LS-N

Pure infrastructure; output is byte-identical whether or not offsets are collected (pinned by with_offsets_emits_identical_function_bytes). The wrong-DWARF-address hazard materializes only when increment 3 consumes this map; its LS-N lands there.

Test plan

  • 4 new rewriter tests green
  • 295 lib tests green, clippy + fmt clean
  • CI green + Mythos AI scan on rewriter.rs (Tier-5)

🤖 Generated with Claude Code

Second increment of DWARF Phase 2. The rewriter changes operand
values (function/global/table/memory/type indices) whose LEB128
encodings shift byte length, so intra-function byte offsets drift
during rewriting — DWARF .debug_line programs cannot be remapped by
function-base relocation alone. This adds the instruction-level
old->new offset map that captures that drift.

  - InstrOffset { old, new } + InstrOffsetMap { entries } with a
    translate(old) -> Option<new> lookup
  - rewrite_function_body_with_offsets: parallel entry point that
    returns (Function, InstrOffsetMap); shares a private core with
    the existing rewrite_function_body (unchanged signature, zero
    caller churn in merger.rs)
  - offsets collected via reader.into_iter_with_offsets() for OLD
    positions + per-instruction encoded-length measurement for NEW
    positions (identical bytes to Function::instruction, via
    wasm_encoder::Encode)
  - opt-in: plain path pays nothing; output is byte-identical whether
    or not offsets are collected (pinned by a test)

4 new tests:
  - instr_offset_map_tracks_leb_growth_from_index_remap: call 0->200
    grows the operand LEB 1->2 bytes; divergence accumulates exactly
    [0,1,1,2,2,2] across call/drop/call/drop/const/end
  - instr_offset_map_is_identity_when_no_leb_length_change: 0->1 keeps
    1-byte LEB, new == old everywhere
  - instr_offset_map_translate_hits_and_misses
  - with_offsets_emits_identical_function_bytes

No new LS-N: pure infrastructure, output byte-identical, no new
hazard surface. The wrong-DWARF-address hazard materializes only when
increment 3 consumes this map; its LS-N lands there.

Increment 3 (gimli .debug_line/.debug_info rewrite) composes this
intra-function map with the per-function base from the v0.16.0
component-provenance v2 code_range.

295 lib tests green, clippy + fmt clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

Mythos delta-pass required

This PR modifies one or more Tier-5 source files (per
scripts/mythos/rank.md):

meld-core/src/rewriter.rs

Before merge, run the Mythos discover protocol on the
modified Tier-5 files:

  1. Follow scripts/mythos/discover.md
    — one fresh agent session per touched Tier-5 file.
  2. For each finding, the agent must produce both a Kani
    harness and a failing PoC test (per the protocol's
    "if you cannot produce both, do not report" rule).
  3. Attach a comment on this PR with either the findings
    (formatted per discover.md's output schema) or
    NO FINDINGS.
  4. Add the mythos-pass-done label to this PR.

Why this gate exists: LS-A-10
(CABI alignment padding in async-lift retptr writeback) was
found by the v0.8.0 pre-release Mythos pass — but it had
lived in the callback emitter since #128, across six
releases. A PR-time gate would have caught it at review
time instead of at the release boundary.

The gate check on this PR will pass once the label is
applied.

@github-actions

LS-N verification gate

⚠️ 35/37 verified — 2 missing regression tests

| | count |
|---|---|
| Passed (≥1 test, all green) | 35 |
| Failed (≥1 test failure) | 0 |
| Missing (no ls_*_NN_* test found) | 2 |

Approved loss-scenarios.yaml entries are expected to have a
regression test named ls_<letter>_<num>_* (e.g. LS-A-11 →
ls_a_11_*). The gate runs each prefix via cargo test --lib --no-fail-fast and aggregates pass/fail/missing.
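The naming convention above amounts to a small ID-to-prefix mapping. The sketch below is an illustration in Rust, not the gate's actual code (the gate is driven by tools/post_verification_comment.py); the function name `ls_test_prefix` and the validation rules are assumptions.

```rust
/// Map a loss-scenario ID such as "LS-A-11" to the test-name prefix
/// "ls_a_11_" that the gate would pass to `cargo test --lib`.
/// Hypothetical helper; returns None for IDs that do not fit LS-<letter>-<num>.
pub fn ls_test_prefix(id: &str) -> Option<String> {
    let mut parts = id.split('-');
    if parts.next()? != "LS" {
        return None;
    }
    let letter = parts.next()?;
    let num = parts.next()?;
    // Reject trailing segments such as "LS-A-11-extra".
    if parts.next().is_some() {
        return None;
    }
    if letter.is_empty() || !letter.chars().all(|c| c.is_ascii_alphabetic()) {
        return None;
    }
    if num.is_empty() || !num.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    Some(format!("ls_{}_{}_", letter.to_ascii_lowercase(), num))
}
```

For the two entries reported missing here, this would yield the prefixes `ls_r_13_` and `ls_m_6_`.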

Failed LS entries

(none)

Missing regression tests
  • LS-R-13
  • LS-M-6

Updated automatically by tools/post_verification_comment.py.
Source of truth: safety/stpa/loss-scenarios.yaml.

@github-actions

Mythos delta-pass (auto)

NO FINDINGS across 1 Tier-5 file(s)

File Verdict Hypothesis
`` ✅ NO FINDINGS

Auto-run via anthropics/claude-code-action@v1
(SHA-pinned) on the touched Tier-5 files, using the
maintainer's Max-plan OAuth token. See
.github/workflows/mythos-auto.yml and
scripts/mythos/discover.md.

@github-actions github-actions Bot added the mythos-pass-done Mythos delta-pass completed on Tier-5 file changes; findings (or NO FINDINGS) attached to PR label May 28, 2026
@avrabe avrabe merged commit 2d74483 into main May 28, 2026
13 of 14 checks passed
@avrabe avrabe deleted the feat/dwarf-phase2-inc2-rewriter-offset-map branch May 28, 2026 20:34
@avrabe avrabe mentioned this pull request May 29, 2026
avrabe added a commit that referenced this pull request May 29, 2026
DWARF Phase 2 increment 2 (#143, #202): rewriter instruction-level
offset map — rewrite_function_body_with_offsets + InstrOffsetMap
capture the intra-function byte drift caused by LEB128 operand-length
changes during index remapping. The second of two anchors DWARF
address remapping needs (increment 1 / v0.16.0 supplied the per-
function base via component-provenance v2 code ranges).

Increment 3 (gimli .debug_line/.debug_info rewrite composing both
maps) follows in a later release.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Labels

mythos-pass-done Mythos delta-pass completed on Tier-5 file changes; findings (or NO FINDINGS) attached to PR